Background & Overview

Various integrated circuits (ICs) used in electronic products are manufactured on semiconductor wafers (Fig. 1). At the end of the manufacturing process, each die on the wafer goes through a series of tests to determine whether it is good for shipment (called wafer sort/binning). The percentage of dies that are good for shipment is called “Yield”.

In this example, we illustrate a case where a team has a yield issue, identifies the underlying root cause using data analysis and visualizations, designs an experiment to solve the problem, and successfully verifies the yield/process improvement using hypothesis testing.

Fig 1. Semiconductor Wafers, Source: wikipedia


In the following sections, we cover the following:

  • Yield definition and criteria
  • Mock-data generation using Python
  • Data exploration to understand the cause of low yield
  • Experiment design for yield improvement
  • Statistical verification of the process improvement

Yield Definition and Criteria

In this example, each part is tested with 6 tests (T01 to T06). A good die is defined as a die that passes all 6 tests. The passing criteria for the tests are as displayed in the table below.

TestName LowerLimit UpperLimit Units
T01_RES NA 1.0e+02 mOhm
T02_VTH 0.6 1.2e+00 Volts
T03_IOFF NA 1.0e-07 Amps
T04_IG_3V NA 5.0e-04 Amps
T05_IG_4V NA 5.0e-04 Amps
T06_IG_5V NA 5.0e-04 Amps
Note:
NA ==> Limit Not Applicable

Generating mock data using Python

This notebook is based on a fictitious semiconductor dataset that I have generated. While the general characteristics of the data are realistic, they don’t correspond to any real process/technology.

We will use Python to generate the mock data, while the visualizations and rest of the analysis will be done in R, just for illustration.

Import the necessary python libraries:
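The import cell itself is not shown in this excerpt; a minimal set covering everything the generation code below uses would be:

```python
# NumPy for random-number generation, pandas for the data frames
import numpy as np
import pandas as pd
```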

Generate a wafer-like X/Y pattern:
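The grid-generation code is also omitted here. Based on the sample rows shown later, a sketch might look like the following; the 6-unit die pitch and the wafer radius of 99 are assumptions inferred from that sample output:

```python
import numpy as np
import pandas as pd

# Die centers on a square grid (6-unit pitch), keeping only sites that
# fall within the wafer radius; both values are inferred from the sample rows
step, wafer_radius = 6, 99
coords = np.arange(-96, 97, step)
xx, yy = np.meshgrid(coords, coords)
df_wafer_XY = pd.DataFrame({'X': xx.ravel(), 'Y': yy.ravel()})
df_wafer_XY['radius'] = np.sqrt(df_wafer_XY['X']**2 + df_wafer_XY['Y']**2)
df_wafer_XY = df_wafer_XY[df_wafer_XY['radius'] <= wafer_radius].reset_index(drop=True)
```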

Function to generate mock test-data:

# The function below generates the mock test data using the Mean and Standard Deviation
# values provided for some of the tests. Some of the calculations may seem complicated 
# as I am trying to replicate some realistic behavior
def generate_wafer_data(df_wafer_XY, LotID = 'tmp', Wafers = [1], rMean = 80, 
                radialFactor = 0.3, rSD = 5, vMean = 0.75, vSD = 0.05, ioffMean = -8):

    # Concatenate the wafer data multiple times to account for multiple wafers
    df = pd.concat([df_wafer_XY]*len(Wafers), ignore_index=True)
    df['LotID'] = LotID
    df['Wafer'] = np.repeat(Wafers, df_wafer_XY.shape[0])
    
    # Generate normally distributed data with an additional component increasing with radius
    df['T01_RES'] = ((np.random.normal(rMean, rSD, df.shape[0])*
                          (1 + radialFactor * (df['radius'] / df['radius'].max())**2)) + 
                            ((np.random.normal(0, rSD/5, df.shape[0])**2)*
                              (1 + radialFactor * (df['radius'] / df['radius'].max()))**6)
                    )
                     
    # Hardcoding extreme outliers to simulate machine error codes etc. 
    df.loc[df['T01_RES'] > 120,'T01_RES'] = 10000
    
    # Generate normally distributed data with an additional component that increases with radius
    df['T02_VTH'] = (np.random.normal(vMean, vSD, df.shape[0])*
                    (1 + radialFactor * (df['radius'] / df['radius'].max())**2))
    
    # The leakages generally tend to vary in orders of magnitude. So, they are simulated as 10^X, 
    # where X is a normally distributed random variable 
    
    df['T03_IOFF'] = np.power(10,np.random.normal(ioffMean, 0.5, df.shape[0]))
    df['T04_IG_3V'] = np.power(10,np.random.normal(-6,0.4,df.shape[0]))
    df['T05_IG_4V'] = np.power(10,np.random.normal(-5,0.4,df.shape[0]))
    df['T06_IG_5V'] = np.power(10,np.random.normal(-4,0.4,df.shape[0]))
    
    # "Pass" column corresponds to die passing all the tests, based on the limits above. 
    df['Pass'] = ((df['T01_RES'] < 100) & (df['T02_VTH'] > 0.6) & (df['T02_VTH'] < 1.2) & 
                  (df['T03_IOFF'] < 1e-7) & (df['T04_IG_3V'] < 5e-4) & 
                  (df['T05_IG_4V'] < 5e-4) & (df['T06_IG_5V'] < 5e-4))
    return(df)
# Generate the initial data to be analyzed
df_baseline_py = generate_wafer_data(df_wafer_XY, LotID = 'Old_1', Wafers = range(1,7), 
                      radialFactor = 0.3, ioffMean = -7.5)

Data Exploration for understanding the cause of low yield

We are told that our current baseline process is yielding low, and we are tasked with improving the yield. The test data from the latest lot (the data frame generated in Python above) and the test limits (as listed in the table above) are provided. As members of the technology development team, we are expected to come up with ideas to improve the process.

Since we will be doing all the visualizations in R, let us first copy the data frame generated in Python into the R environment for easier access, and display the first few rows as a sample.

##     X   Y   radius LotID Wafer   T01_RES   T02_VTH     T03_IOFF
## 1 -96 -24 98.95454 Old_1     1 107.67884 1.0609852 3.560641e-08
## 2 -96 -18 97.67292 Old_1     1 101.04946 0.9285851 9.129461e-09
## 3 -96 -12 96.74709 Old_1     1  98.19720 0.9937547 2.266713e-08
## 4 -96  -6 96.18732 Old_1     1  87.37770 0.9850032 3.749880e-08
## 5 -96   0 96.00000 Old_1     1  99.41267 0.9639424 2.765939e-08
## 6 -96   6 96.18732 Old_1     1 101.13474 0.9890429 7.863498e-08
##      T04_IG_3V    T05_IG_4V    T06_IG_5V  Pass
## 1 6.276649e-07 3.730211e-05 9.463631e-05 FALSE
## 2 3.521231e-07 4.515430e-05 1.766997e-04 FALSE
## 3 8.925451e-07 1.131953e-05 2.581176e-04  TRUE
## 4 8.656977e-07 8.065714e-05 1.238676e-04  TRUE
## 5 5.822397e-07 9.874323e-06 1.756241e-04  TRUE
## 6 1.030515e-06 2.144699e-06 1.318054e-04 FALSE

Let us now check the yields of each wafer as indicated by the ‘Pass’ column value for each die:
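The R snippet for this check is not shown in this excerpt; the equivalent computation on the Python frame is a simple group-by (the toy frame below is a stand-in for the much larger df_baseline_py):

```python
import pandas as pd

# Toy stand-in for df_baseline_py: 2 wafers, 2 dies each
df = pd.DataFrame({'Wafer': [1, 1, 2, 2],
                   'Pass':  [True, False, True, True]})

# Yield per wafer = fraction of dies with Pass == True
yield_by_wafer = df.groupby('Wafer')['Pass'].mean()
print(yield_by_wafer)  # wafer 1 -> 0.50, wafer 2 -> 1.00
```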

The next step is to figure out which tests are contributing to the yield loss, using the limits provided above. Let us look at what percentage of dies pass each test in each wafer. It is always good to look at the yields by wafer in addition to the yields for all wafers together, as there could be wafer-to-wafer variations in manufacturing.

# The test names are assigned as the row names for convenience
row.names(df_limits) <- df_limits$TestName

# The function below generates a tile plot of yield by category for each wafer
# taking the test data and test limits as inputs

plot_yield_tile <- function(df_limits, df_test_data){
  
  # Generate a separate Pass/Fail column for each test
  for(i in 1:nrow(df_limits)){
    df_test_data[paste0('Pass_',df_limits$TestName[i])] <- (
      (is.na(df_limits$UpperLimit[i]) | 
         (df_test_data[paste0(df_limits$TestName[i])] < df_limits$UpperLimit[i]))  & 
      (is.na(df_limits$LowerLimit[i]) | 
         (df_test_data[paste0(df_limits$TestName[i])] > df_limits$LowerLimit[i]))
      )
  }
  
  # Calculate yield grouped by category and wafer
  df_category_yield <- df_test_data[c('LotID','Wafer',
                                      grep('Pass',colnames(df_test_data),value = T))] %>% 
                          melt(id.vars = c('LotID','Wafer'), variable.name = 'Category') %>%
                          group_by(LotID,Wafer,Category) %>% 
                          summarise(Yield_Percent = sum(value)/n(), NumDies = n())
  
  # Cleaning up the category names
  df_category_yield$Category <- gsub('Pass$','Final_Yield',df_category_yield$Category)
  
  # Generating the tile plot
  pyield <- ggplot(df_category_yield, aes(x = Wafer, y = Category, fill = Yield_Percent)) + 
            geom_tile(color = 'black') + 
            geom_text(aes(label = scales::percent(Yield_Percent, accuracy = 1)), size = 5) + 
            scale_fill_gradient(low = 'red', high = 'green', 
                                limits = c(0.2,1), labels = scales::percent) + 
            scale_x_continuous(expand=c(0,0),breaks = (1:6)) + 
            theme(axis.text = element_text(size = 12), axis.title = element_text(size = 16)) + 
            labs(title = 'Percentage of dies passing each test in each wafer')
  
  return(pyield)
}

print(plot_yield_tile(df_limits, df_baseline))

From the above, we can conclude the following:

  • T01_RES and T03_IOFF have the lowest pass percentages, making them the biggest yield loss contributors.
  • The pass percentages are fairly consistent across wafers, so there is no significant wafer-to-wafer variation.

Another way to look at yield losses is a yield Pareto chart, which lists the yield loss categories from most frequent to least frequent. Since we already know there is no significant wafer-to-wafer variation, we will look at the yield loss for the whole lot to get the bigger picture.

# The function below generates a yield Pareto chart which displays yield loss categories
# from most significant to least significant as a bar chart

plot_yield_pareto <- function(df_limits, df_test_data){
  
  # Generate a separate Pass/Fail column for each test
  for(i in 1:nrow(df_limits)){
    df_test_data[paste0('Pass_',df_limits$TestName[i])] <- (
      (is.na(df_limits$UpperLimit[i]) | 
         (df_test_data[paste0(df_limits$TestName[i])] < df_limits$UpperLimit[i]))  & 
      (is.na(df_limits$LowerLimit[i]) | 
         (df_test_data[paste0(df_limits$TestName[i])] > df_limits$LowerLimit[i]))
      )
  }
  
  # For the pareto we need to identify the first test which fails for each die
  # If none of the tests fail, the die is marked as "Pass"
  df_tests_inverse <- !(df_test_data[grep('Pass_T',colnames(df_test_data), value = T)])
  df_test_data$FirstFailTest <- colnames(df_tests_inverse)[ifelse(
                              rowSums(df_tests_inverse)==0, NA,max.col(df_tests_inverse, "first"))]
  df_test_data$FirstFailTest <- ifelse(is.na(df_test_data$FirstFailTest),
                                       'Pass',gsub('Pass_','',df_test_data$FirstFailTest))
  
  # Generate the plot
  pareto <- df_test_data %>% group_by(FirstFailTest) %>% 
              summarise(Bin_Perc = n()/nrow(df_test_data)) %>%
              ggplot(aes(x = reorder(FirstFailTest,-Bin_Perc), 
                         y = Bin_Perc, fill = FirstFailTest)) + 
              geom_bar(stat = 'identity') + scale_y_continuous(labels = scales::percent) + 
              geom_text(aes(label = scales::percent(Bin_Perc, accuracy = 1)), nudge_y = 0.02) +
              scale_fill_brewer(type = 'qual') + 
              labs(fill = '', x = 'First Failed Test or Pass', 
                   y = 'Bin Percentage', title = 'Failure Bin Category by First Failed Test')

  return(pareto)
}
plot_yield_pareto(df_limits, df_baseline)

So far, we have figured out that T01_RES and T03_IOFF are our biggest yield loss factors, but we have not yet looked at the individual measurement data. In the following section, we look at each individual test (T01 to T06) and how it is distributed across the wafer.

Test data Box-plots and Wafer Maps

The following plots show box plots for each wafer and each test, along with wafer maps showing how the data is distributed across the wafer. The upper and lower limits for each test (if applicable) are shown as red dashed lines.

From the figures below we can observe that:

  • T01 and T02 show a center to edge increase in the results.
  • T01 has a large number of points above the test limit, which is causing this test to be the major yield loss factor.
  • The dies with high T01 values are near the edges, while the dies with high T03 values are more randomly distributed.
# changing the wafer variable from numeric to factor for better formatting
df_baseline$Wafer <- as.factor(df_baseline$Wafer)

# loop through the test data columns and generate one plot for each test
for(col in grep('T0',colnames(df_baseline),value = T)){
 
  # automatically determine the limits to be used for Y-axis
  # this is to exclude outliers and zoom to a sensible window
  ylim1 <- boxplot.stats(df_baseline[,col], coef = 1.6)$stats[c(1,5)]
  ybox_min <- min(ylim1[1],df_limits[col,'LowerLimit']*0.8, na.rm = T)
  ybox_max <- max(ylim1[2],df_limits[col,'UpperLimit']*1.2, na.rm = T)
  
  pbox <- ggplot(df_baseline, aes_string(y = col, x = 'Wafer', color = 'Wafer')) + 
          geom_boxplot(outlier.alpha = 0) + 
          geom_jitter(alpha = 0.2, size = 2, width = 0.2) + 
          facet_grid(~Wafer, scales = 'free_x', labeller = label_both) +
    
          # apply the automatic Y-axis limits from above 
          coord_cartesian(ylim = c(ybox_min,ybox_max)) + 
    
          # add lines corresponding to the upper and lower test limits 
          geom_hline(yintercept = df_limits[col,'LowerLimit'], color = 'red', linetype = 2) + 
          geom_hline(yintercept = df_limits[col,'UpperLimit'], color = 'red', linetype = 2) + 
    
          # theme changes for better formatting
          theme(axis.text.x = element_blank(), axis.title.x = element_blank(), 
                axis.ticks.x = element_blank())
  
  # currents are better displayed in log-scale on the Y-axis 
  if(grepl('_I',col)){
    pbox <- pbox + scale_y_log10()  
  }
  
  pmap <- ggplot(df_baseline, aes(x = X, y = Y)) + 
          geom_tile(color='gray', aes_string(fill = col)) + 
          facet_grid(~Wafer, scales = 'free_x', labeller = label_both) + 
          scale_fill_gradient(low = 'green', high = 'red', limits = ylim1, oob = squish) +
          theme(strip.text = element_blank(), axis.text.x = element_text(angle = 45, vjust = 0.5))
  
  
  # Print one header per graph (useful when automating reports)
  cat('###  ', col, '\n\n')
  print(ggarrange(pbox, pmap, nrow = 2, heights = c(2,1), align = 'v'))
  cat(' \n\n')
}

T01_RES

T02_VTH

T03_IOFF

T04_IG_3V

T05_IG_4V

T06_IG_5V

Experiment design for yield improvement and mock-data generation

Let us assume that the team came up with two ideas to improve the yield:

  • Process A: changing the process temperature (tried at a higher level, A1, and a lower level, A2)
  • Process B: applying a higher strain (B1)

Ideally, we would use a full-factorial experiment design to ensure we have enough statistical power to draw useful conclusions from the experiment. But for simplicity, let us assume we tried two additional levels of temperature and one additional level of strain in our experiment.

The following code generates the new data frame with the intended properties using the “generate_wafer_data” function we wrote before.
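The generation cell is omitted in this excerpt. The pattern is one generate_wafer_data call per split condition, concatenated into a single lot. The sketch below uses a trimmed stand-in for the function (and illustrative rMean/ioffMean shifts, not the values behind the actual plots) so that it runs on its own:

```python
import numpy as np
import pandas as pd

# Trimmed stand-in for the real generate_wafer_data defined earlier;
# it only keeps the lot/wafer bookkeeping so this sketch is self-contained
def generate_wafer_data(df_wafer_XY, LotID='tmp', Wafers=[1], **kwargs):
    df = pd.concat([df_wafer_XY] * len(Wafers), ignore_index=True)
    df['LotID'] = LotID
    df['Wafer'] = np.repeat(list(Wafers), df_wafer_XY.shape[0])
    return df

df_wafer_XY = pd.DataFrame({'X': [0], 'Y': [0], 'radius': [0.0]})

# One call per split condition (wafer numbers follow the table below);
# the rMean/ioffMean shifts are illustrative assumptions
splits = [(range(1, 13, 4), dict(rMean=80, ioffMean=-7.5)),  # Baseline: wafers 1,5,9
          (range(2, 13, 4), dict(rMean=72, ioffMean=-7.5)),  # A1: wafers 2,6,10
          (range(3, 13, 4), dict(rMean=88, ioffMean=-7.5)),  # A2: wafers 3,7,11
          (range(4, 13, 4), dict(rMean=72, ioffMean=-7.0))]  # B1: wafers 4,8,12
df_experiment_py = pd.concat(
    [generate_wafer_data(df_wafer_XY, LotID='Exp_1', Wafers=list(w), **kw)
     for w, kw in splits], ignore_index=True)
```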

As before, let us copy the data frame from the Python environment to the R environment for convenience.

Here is a table of the split conditions for the wafers in our experiment:

Wafer Process Temp Strain
1 Baseline Baseline Baseline
2 A1 Higher Baseline
3 A2 Lower Baseline
4 B1 Baseline Higher
5 Baseline Baseline Baseline
6 A1 Higher Baseline
7 A2 Lower Baseline
8 B1 Baseline Higher
9 Baseline Baseline Baseline
10 A1 Higher Baseline
11 A2 Lower Baseline
12 B1 Baseline Higher

Visualization improvements for analyzing data from multiple processes

As we can see, our lot now consists of wafers from 4 different processes, and more effort is needed to visualize the data effectively.

For example, if we were to look at the test results for “T01_RES” without much customization, it would be pretty hard to tell which process conditions are better, even after taking care of the basics such as excluding outliers.

To visualize the data more effectively for this experiment with multiple variables, I made the following customizations (some of which were already implemented in the previous set of box plots): the facet strip is replaced with a tile plot showing the experiment variables and their color-coded levels, and the data is grouped/faceted by the experiment conditions.

# Change Wafer column to factor type for better formatting
df_experiment$Wafer <- as.factor(df_experiment$Wafer)

# The columns to be used for grouping / faceting
# This can be customized based on experiment conditions
grouping_cols <- c('Temp','Strain','Process','Wafer')
facet_formula = as.formula(paste("~",paste0(grouping_cols, collapse = '+')))

# Code below generates a tile plot with experiment variables on Y-axis
# and various levels of those variables color coded 
# this tile plot replaces the ribbon / strip associated with the facet information

group_table <- df_experiment %>% group_by(.dots = grouping_cols) %>% summarise()
group_table2 <- group_table
colnames(group_table2) <- paste0(colnames(group_table2),'_tmp1')
group_table <- cbind(group_table, group_table2)
group_table_stack <- melt(group_table,id.vars = grouping_cols, variable.name = 'Split')
group_table_stack$Split <- factor(gsub('_tmp1$','',group_table_stack$Split), 
                                  levels = rev(grouping_cols))
group_table_stack$value2 <- group_table_stack$value
group_table_stack$value2[group_table_stack$Split == 'Wafer'] <- NA
pdoe <- ggplot(group_table_stack, aes(x = Wafer, y = Split, label = value)) + 
          geom_tile(aes(fill = value2), color = 'black') + 
          geom_text(size = 4) +
          facet_grid(facet_formula,scales = 'free_x') + 
          scale_x_discrete(expand = c(0,0)) + 
          theme(legend.key.size = unit(0, 'cm'), 
                legend.text = element_text(size = 1, color = 'white'),
                axis.title.x = element_blank(), axis.text.x = element_blank(), 
                axis.ticks.x = element_blank(),axis.title.y = element_blank(), 
                axis.text.y = element_text(size = 10, vjust = 0.5),
                strip.text = element_blank(), panel.spacing.x = unit(0.05, "lines"), 
                panel.background = element_blank(), plot.margin = margin(0, 0.1, -0.1, 0.1, "cm"), 
                plot.background = element_blank())  + 
         labs(fill = '')  + 
         scale_fill_manual(values = get_palette('Pastel1',length(unique(group_table_stack$value2)))) 

# Loop through the test data columns and generate one plot for each test
for(col in grep('T0',colnames(df_experiment),value = T)){

  # Automatically determine the limits to be used for Y-axis
  # This is to exclude outliers and zoom to a sensible window
  
  ylim1 <- boxplot.stats(df_experiment[,col], coef = 1.6)$stats[c(1,5)]
  ybox_min <- min(ylim1[1],df_limits[col,'LowerLimit']*0.8, na.rm = T)
  ybox_max <- max(ylim1[2],df_limits[col,'UpperLimit']*1.2, na.rm = T)
  
  pbox <- ggplot(df_experiment, aes_string(y = col, x = 'Wafer', color = 'Process')) + 
            geom_boxplot(outlier.alpha = 0) + 
            
            # apply the automatic Y-axis limits from above             
            coord_cartesian(ylim = c(ybox_min,ybox_max)) + 
            geom_jitter(alpha = 0.2, size = 2, width = 0.2) +
    
            # faceting / grouping based on the chosen categories
            facet_grid(facet_formula,scales = 'free_x') +  
    
            # add lines corresponding to the upper and lower test limits    
            geom_hline(yintercept = df_limits[col,'LowerLimit'], color = 'red', linetype = 2) + 
            geom_hline(yintercept = df_limits[col,'UpperLimit'], color = 'red', linetype = 2) + 
            
            # theme settings for better formatting
            theme(strip.text = element_blank(), 
                  axis.text.x = element_blank(), axis.title.x = element_blank(), 
                  axis.text.y = element_text(size = 10),
                  panel.spacing.x = unit(0.05,"lines"), panel.border = element_rect(color = 'gray', fill = NA, size = 0.5),
                  plot.margin = margin(-0.1, 0.1, -0.1, 0.1, 'cm'),
                  axis.ticks.x = element_blank())
  
  # currents are better displayed in log-scale on Y-axis
  if(grepl('_I',col)){
    pbox <- pbox + scale_y_log10()  
  }
  
  pmap <- ggplot(df_experiment, aes(x = X, y = Y)) + geom_tile(color='gray', aes_string(fill = col)) + 
          facet_grid(facet_formula,scales = 'free_x') + 
          scale_fill_gradient(low = 'green', high = 'red', limits = ylim1, oob = squish) +
          scale_x_continuous(breaks = c(-50,0,50)) + 
          theme(strip.text = element_blank(), 
                #axis.text.x = element_text(angle = 270, hjust = 0),
                plot.margin = margin(0, 0.1, 0, 0.1, 'cm'),
                panel.spacing.x = unit(0.05,"lines"), panel.border = element_rect(color = 'gray', fill = NA, size = 0.5))
  
  # print one header per graph (useful when automating reports)
  cat('###  ', col, '\n\n')
  
  # Use ggarrange to combine multiple sections of the plot
  print(ggarrange(pdoe,pbox, pmap, nrow = 3, heights = c(2,2.5,1.5), align = 'v'))
  
  cat(' \n\n')
}

T01_RES

T02_VTH

T03_IOFF

T04_IG_3V

T05_IG_4V

T06_IG_5V

Observations from the experiment

From the above data, we can see the following:

  • Processes “A1” and “B1” have significantly improved the T01_RES distributions
  • However, process “B1” seems to have degraded the “T03_IOFF” distributions by moving them further beyond the test limit
  • Process “A2” seems to have poor results for T02 and T01.
  • Overall, process “A1” seems to be our best bet for yield improvement

We can see the same in the yield-by-wafer/process tile plot below, where the last row indicates that the wafers with Process A1 have the best yields (> 75%).

# The function below generates a tile plot of yield by category for each wafer
# taking the test data and test limits as inputs
plot_yield_tile2 <- function(df_limits, df_test_data){
  
  # Generate a separate Pass/Fail column for each test
  for(i in 1:nrow(df_limits)){
    df_test_data[paste0('Pass_',df_limits$TestName[i])] <- (
      (is.na(df_limits$UpperLimit[i]) | 
         (df_test_data[paste0(df_limits$TestName[i])] < df_limits$UpperLimit[i]))  & 
      (is.na(df_limits$LowerLimit[i]) | 
         (df_test_data[paste0(df_limits$TestName[i])] > df_limits$LowerLimit[i]))
      )
  }
  
  # Calculate yield grouped by category and wafer
  df_category_yield <- df_test_data[c(grouping_cols,
                                      grep('Pass',colnames(df_test_data),value = T))] %>% 
                          melt(id.vars = c(grouping_cols), variable.name = 'Category') %>%
                          group_by(.dots = c(grouping_cols,'Category')) %>% 
                          summarise(Yield_Percent = sum(value)/n(), NumDies = n())
  
  # Cleaning up the category names
  df_category_yield$Category <- gsub('Pass$','Final_Yield',df_category_yield$Category)
  
  # Generating the tile plot
  pyield <- ggplot(df_category_yield, aes(x = Wafer, y = Category, fill = Yield_Percent)) + 
            geom_tile(color = 'black') + 
            facet_grid(facet_formula, scales = 'free_x') + 
            geom_text(aes(label = scales::percent(Yield_Percent, accuracy = 1)), size = 4) + 
            scale_fill_gradient(low = 'red', high = 'green', 
                                limits = c(0.2,1), labels = scales::percent) + 
            theme(strip.text = element_blank(), 
                  axis.text.x = element_blank(), axis.title.x = element_blank(), 
                  axis.text.y = element_text(size = 10),
                  panel.spacing.x = unit(0.05,"lines"), panel.border = element_rect(color = 'gray', fill = NA, size = 0.5),
                  plot.margin = margin(-0.1, 0.1, 0, 0.1, 'cm'),
                  axis.ticks.x = element_blank())
  
  # Use ggarrange to combine multiple sections of the plot
  return(ggarrange(pdoe,pyield, nrow = 2, heights = c(1,2),align = 'v'))
}

plot_yield_tile2(df_limits,df_experiment)

Statistical verification of process improvement

So far, we have qualitatively observed that Process A1 improved the parameters, but ideally we would like a more rigorous way of determining whether Process A1 made a significant difference.

We can use an independent two-sample t-test to check whether the means of the two processes are statistically different for each test and for the yield (column: “Pass”), using the following code.
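The R code for this step is not shown in this excerpt. For reference, the t statistic and degrees of freedom that R's default t.test reports can be sketched in numpy; the sample data below is a stand-in, loosely modeled on the T01_RES group means in the output:

```python
import numpy as np

def welch_t(x, y):
    # Welch's two-sample t statistic and degrees of freedom
    # (the quantities R's t.test reports by default)
    x, y = np.asarray(x, float), np.asarray(y, float)
    nx, ny = len(x), len(y)
    vx, vy = x.var(ddof=1), y.var(ddof=1)
    se2 = vx / nx + vy / ny
    t = (x.mean() - y.mean()) / np.sqrt(se2)
    df = se2**2 / ((vx / nx)**2 / (nx - 1) + (vy / ny)**2 / (ny - 1))
    return t, df

rng = np.random.default_rng(0)
a1 = rng.normal(85.4, 5.0, 2500)        # stand-in for Process A1 T01_RES
baseline = rng.normal(94.3, 5.0, 2500)  # stand-in for Baseline T01_RES
t, df = welch_t(a1, baseline)           # large negative t => A1 mean is lower
```

The p-value then comes from the t distribution with df degrees of freedom (e.g. via scipy.stats.t.sf), which is exactly what R's t.test prints.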

Using p-value < 0.05 as the criterion for statistical significance of the difference in means, we can see that Process A1 caused statistically significant changes in T01_RES, T02_VTH, and the final Pass/Fail column (“Pass”), while it did not significantly affect the other tests such as T03_IOFF and T06_IG_5V.

These results are also consistent with the yield improvement observations from above.

## $T01_RES
## 
##  Welch Two Sample t-test
## 
## data:  x by df_Baseline_and_ProcessA1$Process
## t = -37.991, df = 4223.6, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -9.314052 -8.399915
## sample estimates:
##       mean in group A1 mean in group Baseline 
##               85.40469               94.26167 
## 
## 
## $T02_VTH
## 
##  Welch Two Sample t-test
## 
## data:  x by df_Baseline_and_ProcessA1$Process
## t = -34.3, df = 4408.3, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.07438112 -0.06633796
## sample estimates:
##       mean in group A1 mean in group Baseline 
##              0.7876653              0.8580249 
## 
## 
## $T03_IOFF
## 
##  Welch Two Sample t-test
## 
## data:  x by df_Baseline_and_ProcessA1$Process
## t = 0.19765, df = 5079.1, p-value = 0.8433
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -4.848585e-09  5.935840e-09
## sample estimates:
##       mean in group A1 mean in group Baseline 
##           6.108968e-08           6.054606e-08 
## 
## 
## $T04_IG_3V
## 
##  Welch Two Sample t-test
## 
## data:  x by df_Baseline_and_ProcessA1$Process
## t = -0.45963, df = 5079.2, p-value = 0.6458
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.174604e-07  7.284301e-08
## sample estimates:
##       mean in group A1 mean in group Baseline 
##           1.537378e-06           1.559687e-06 
## 
## 
## $T05_IG_4V
## 
##  Welch Two Sample t-test
## 
## data:  x by df_Baseline_and_ProcessA1$Process
## t = -1.0209, df = 5050.3, p-value = 0.3074
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.375054e-06  4.333390e-07
## sample estimates:
##       mean in group A1 mean in group Baseline 
##           1.486183e-05           1.533269e-05 
## 
## 
## $T06_IG_5V
## 
##  Welch Two Sample t-test
## 
## data:  x by df_Baseline_and_ProcessA1$Process
## t = -0.9414, df = 4987.5, p-value = 0.3465
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.346216e-05  4.727513e-06
## sample estimates:
##       mean in group A1 mean in group Baseline 
##           0.0001490913           0.0001534586 
## 
## 
## $Pass
## 
##  Welch Two Sample t-test
## 
## data:  x by df_Baseline_and_ProcessA1$Process
## t = 17.873, df = 4848.9, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.2011285 0.2506857
## sample estimates:
##       mean in group A1 mean in group Baseline 
##              0.7987495              0.5728424

Conclusions

In summary, this example demonstrates various data analysis, data visualization, and hypothesis testing techniques applied to a sample semiconductor test dataset.